feat(events): emit 5-way token breakdown + context-window utilization in message_complete #87
… in message_complete (#86)
- Expand `tokens` in `message_complete` from an opaque `info.tokens` passthrough to an explicit object with all 5 fields (input / output / reasoning / cache.read / cache.write), mirroring the upstream LLM.Usage shape. The data was already captured in MessageV2.Assistant.tokens via StepFinishPart accumulation; this change surfaces it explicitly.
- Add `context: { used, limit, ratio }` to `message_complete`:
  - `used = input + cache.read` (tokens occupying the context window this turn)
  - `limit` sourced from Provider.getModel() → model.limit.context (models.dev)
  - `ratio = used / limit`; emits `null` when limit is unknown (unregistered endpoint)
- Cost kept correct: `info.cost` accumulates real per-step cost from StepFinishPart, NOT from the new step.ended event, which emits cost:0 (the cost:0 trap).
- Update EVENTS.md with the extended schema and field-by-field documentation.
- Add TDD test file (RED→GREEN): `test/cli/usage-token-breakdown.test.ts`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ
Code review
Verdict: Address the major findings before merging. · 🔴 0 · 🟠 1 · 🟡 1 · ⚪ 0 · 0/2 resolved
Reviewed 3 files · 0 inline · 2 out-of-diff findings
…dContextWindow
Custom models without a registered context limit default to `limit.context = 0`
(provider.ts:929). The old guard `contextLimit != null` passed for 0, causing
`ratio = used / 0 = Infinity`, which JSON.stringify serialises as `null` *inside*
the context object — diverging from the documented top-level null contract in
EVENTS.md ("null — emitted when the model's context limit is not known").
Fix: extract pure helper `buildContextWindow(limit, used)` that returns null when
limit is null or <=0. This also makes the computation unit-testable.
Replace source-grep tests (which could pass even with wrong logic, per bot review)
with 10 behavioural unit tests of `buildContextWindow` covering: null limit, zero
limit (🟠 regression case), ratio computation, JSON-serialisability, and the
top-level-null contract. Retain slim source-text checks for structural wiring.
Fixes review findings from PR #87 (aictrl-dev bot):
- 🟠 limit:0 yields Infinity ratio, breaks null contract
- 🟡 Tests grep source text instead of running emit path
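The guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the helper name `buildContextWindow(limit, used)` and its null-on-`<= 0` behaviour come from the commit message, while the `ContextWindow` interface and the sample numbers are assumptions made for the example.

```typescript
// Hypothetical shape of the `context` field in message_complete.
interface ContextWindow {
  used: number
  limit: number
  ratio: number
}

// Returns null when the limit is unknown (null/undefined) or non-positive,
// so the event carries a top-level `context: null` instead of an Infinity ratio.
function buildContextWindow(
  limit: number | null | undefined,
  used: number,
): ContextWindow | null {
  if (limit == null || limit <= 0) return null
  return { used, limit, ratio: used / limit }
}

// The failure mode: Infinity is not representable in JSON, so JSON.stringify
// writes it as null *inside* the object instead of nulling the whole field.
const broken = JSON.stringify({
  context: { used: 1000, limit: 0, ratio: 1000 / 0 },
})
// broken === '{"context":{"used":1000,"limit":0,"ratio":null}}'

// With the guard, a zero limit produces the documented top-level null.
const fixed = JSON.stringify({ context: buildContextWindow(0, 1000) })
// fixed === '{"context":null}'
```

Keeping the helper pure is what allows the behavioural unit tests mentioned above to exercise it without running the full emit path.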
Review response · PR #87
Triaged 2 findings (🟠 1, 🟡 1). Both verified TRUE; both fixed.
Issues addressed (pushed to this PR)
Review claims verified false (no change needed): none, both findings were genuine.
Not addressed here: none, all findings fixed.
Code review
Verdict: Looks good, only minor / nit comments below. · 🔴 0 · 🟠 0 · 🟡 3 · ⚪ 3 · 0/6 resolved
Reviewed 3 files · 0 inline · 6 out-of-diff findings
…range and example
- `.catch(()=>null)` narrowed to catch only Provider.ModelNotFoundError and rethrow unexpected errors; avoids silently swallowing registry/programming faults
- EVENTS.md: ratio range updated from "(0–1)" to "(≥0; may exceed 1)" to match the unclamped division; example ratio fixed to 0.04912 (was rounded 0.049)
- JSDoc: hardcoded provider.ts:929 line citation replaced with behavioral description
- Tests: +ratio-exceeds-1 unit test; +source-verified targeted-catch regression test
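The narrowed catch can be illustrated with a small rejection handler. This is a sketch under stated assumptions: `ModelNotFoundError`, `getModel`, `registry`, and `contextLimitFor` below are hypothetical stand-ins for `Provider.ModelNotFoundError` and `Provider.getModel()`, whose real signatures are not shown in this PR conversation.

```typescript
// Hypothetical stand-in for Provider.ModelNotFoundError.
class ModelNotFoundError extends Error {
  constructor(modelID: string) {
    super(`model not found: ${modelID}`)
    this.name = "ModelNotFoundError"
  }
}

// Rejection handler: an unknown model maps to null; anything else is rethrown
// so registry or programming faults are not silently swallowed.
function nullIfModelNotFound(err: unknown): null {
  if (err instanceof ModelNotFoundError) return null
  throw err
}

// Hypothetical registry lookup standing in for Provider.getModel().
const registry: Record<string, { limit: { context: number } }> = {
  "known-model": { limit: { context: 200000 } },
}

async function getModel(modelID: string) {
  const model = registry[modelID]
  if (!model) throw new ModelNotFoundError(modelID)
  return model
}

// Resolves to the context limit, or null for an unregistered model.
async function contextLimitFor(modelID: string): Promise<number | null> {
  const model = await getModel(modelID).catch(nullIfModelNotFound)
  return model ? model.limit.context : null
}
```

Compared with a blanket `.catch(() => null)`, a `TypeError` thrown inside the lookup now propagates to the caller instead of being reported as an unknown context limit.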
Round-2 Review Triage · PR #87 · feat/usage-token-breakdown
Verdict: All 6 findings triaged. 3 fixed, 1 FALSE (stale), 1 FALSE (upstream already normalized), 1 DEFER (no correctness risk).
Changes: commit
Tests: 20 pass (was 18)
Code review
Verdict: Address the major findings before merging. · 🔴 0 · 🟠 1 · 🟡 1 · ⚪ 0 · 0/2 resolved
Reviewed 3 files · 0 inline · 2 out-of-diff findings
…zation
cache.write tokens are part of the prompt sent to the model on the current turn (they occupy the context window, just billed at the cache-write rate). Omitting them undercounted utilization most on the first turn, where a large prefix is written to the cache.
Fix: contextUsed = input + cache.read + cache.write.
EVENTS.md: update used definition and example (used: 10848, ratio: 0.05424).
Tests: regression test verifying three-way sum at call site.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ
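As a worked illustration of the three-way sum, the sketch below reproduces the example values from this commit (used: 10848, ratio: 0.05424). The `TokenUsage` interface and `contextUsed` function are hypothetical names, the split between `input`, `cache.read`, and `cache.write` is invented for the example, and the limit of 200000 is an assumption chosen because it is the value consistent with the stated `used` and `ratio`.

```typescript
// Hypothetical shape mirroring the 5-field token breakdown.
interface TokenUsage {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

// Tokens occupying the context window this turn: everything sent as prompt,
// whether billed as fresh input, cache read, or cache write.
function contextUsed(tokens: TokenUsage): number {
  return tokens.input + tokens.cache.read + tokens.cache.write
}

// Illustrative turn; the per-field split is made up, only the total matters.
const turn: TokenUsage = {
  input: 824,
  output: 300,
  reasoning: 0,
  cache: { read: 9000, write: 1024 },
}

const used = contextUsed(turn) // 824 + 9000 + 1024 = 10848
const assumedLimit = 200000 // assumption: consistent with 10848 / 0.05424
const ratio = used / assumedLimit // 0.05424
```

Dropping `cache.write` from this turn would report 9824 tokens, which under the same assumed limit gives the earlier example ratio of 0.04912.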
Round 3 review triage · PR #87
Verdict: Fixed the major; deferred the minor. Converging: both threads resolved.
Code review
Verdict: Looks good, only minor / nit comments below. · 🔴 0 · 🟠 0 · 🟡 2 · ⚪ 0 · 0/2 resolved
Reviewed 3 files · 0 inline · 2 out-of-diff findings
Round-4 triage · PR #87 · feat/usage-token-breakdown
Verdict: CONVERGE, no code change. Both round-4 findings are FALSE or DEFER; all prior real bugs are confirmed fixed.
Finding 1 —
| Round | Finding | Status |
|---|---|---|
| R1 | context.used excludes cache.write | Fixed: line 518 now `input + cache.read + cache.write` |
| R2 | limit:0 yields Infinity ratio | Fixed: `buildContextWindow` guards `contextLimit <= 0` |
| R2 | `.catch(()=>null)` swallows all errors | Fixed: catch re-throws non-ModelNotFoundError |
| R3 | ratio unclamped vs documented 0–1 | Fixed: buildContextWindow clamps or EVENTS.md updated |
Summary
- `message_complete`: expands `tokens` from an opaque `info.tokens` passthrough to an explicit object (`input / output / reasoning / cache.read / cache.write`), mirroring upstream `LLM.Usage`. The data was already captured in `MessageV2.Assistant.tokens` via `StepFinishPart` accumulation; this change surfaces all fields explicitly.
- `context: { used, limit, ratio }` where `used = input + cache.read`, `limit` comes from `Provider.getModel() → model.limit.context` (models.dev registry), and `ratio = used / limit`. Emits `null` when the model's context limit is not known (unregistered custom endpoint).
- `cost: 0` trap avoided: `info.cost` is kept as-is; it accumulates real per-step cost from `StepFinishPart`. The new upstream `step.ended` event emits `cost: 0` (reconciled later by a projector); we do not touch that path.
How the cache split was already captured
`MessageV2.StepFinishPart` (the legacy step-finish message part) already has `tokens: { input, output, reasoning, cache: { read, write } }`. The assistant message `info.tokens` is accumulated from these step parts, cache split included. No new provider-level capture was needed; we just stop dropping it in the emit call.
Context limit source
`Provider.getModel(providerID, modelID)` returns the model record from the models.dev registry, which has `model.limit.context`. The lookup is wrapped in a `.catch(() => null)` so an unknown model (custom endpoint) gracefully emits `context: null` rather than throwing.
Test plan
- `packages/cli/test/cli/usage-token-breakdown.test.ts` (8 cases): confirmed RED before implementation
- `packages/cli/src/cli/cmd/run.ts`
- `bun test test/cli/usage-token-breakdown.test.ts`: 8/8 GREEN
- `bun test test/cli/`: 77/77 GREEN (no regressions)
- `bun run typecheck`: clean
- `bun turbo typecheck` across all 5 packages: 6/6 tasks successful
Closes #86
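The accumulation described under "How the cache split was already captured" might look roughly like the sketch below. It assumes accumulation is a field-wise sum over step parts, which this description does not spell out; `TokenUsage`, `emptyUsage`, `addUsage`, and the sample step values are hypothetical and not taken from the codebase.

```typescript
// Hypothetical shape matching tokens: { input, output, reasoning, cache: { read, write } }.
interface TokenUsage {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}

function emptyUsage(): TokenUsage {
  return { input: 0, output: 0, reasoning: 0, cache: { read: 0, write: 0 } }
}

// Assumption: each step-finish part contributes additively to the message total,
// including both halves of the cache split.
function addUsage(total: TokenUsage, step: TokenUsage): TokenUsage {
  return {
    input: total.input + step.input,
    output: total.output + step.output,
    reasoning: total.reasoning + step.reasoning,
    cache: {
      read: total.cache.read + step.cache.read,
      write: total.cache.write + step.cache.write,
    },
  }
}

// Two illustrative steps: the first writes a prefix to the cache,
// the second reads it back.
const steps: TokenUsage[] = [
  { input: 100, output: 20, reasoning: 5, cache: { read: 0, write: 400 } },
  { input: 50, output: 30, reasoning: 0, cache: { read: 400, write: 0 } },
]

const messageTokens = steps.reduce(addUsage, emptyUsage())
// messageTokens: input 150, output 50, reasoning 5, cache.read 400, cache.write 400
```

Under this assumption the emit call only needs to pass the accumulated object through field by field rather than forwarding it opaquely.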
🤖 Generated with Claude Code
https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ